Tandem mass tag-based quantitative proteomic profiling identifies candidate serum biomarkers of drug-induced liver injury in humans

Diagnosis of drug-induced liver injury (DILI) and its distinction from other liver diseases are significant challenges in drug development and clinical practice. Here, we identify, confirm, and replicate the biomarker performance characteristics of candidate proteins in patients with DILI at onset (DO; n = 133) and follow-up (n = 120), acute non-DILI at onset (NDO; n = 63) and follow-up (n = 42), and healthy volunteers (HV; n = 104). Across cohorts, the area under the receiver operating characteristic curve (AUC) for cytoplasmic aconitate hydratase, argininosuccinate synthase, carbamoylphosphate synthase, fumarylacetoacetase, and fructose-1,6-bisphosphatase 1 (FBP1) showed near-complete separation of DO and HV (range: 0.94–0.99). In addition, we show that FBP1, alone or in combination with glutathione S-transferase A1 and leukocyte cell-derived chemotaxin 2, could potentially assist in clinical diagnosis by distinguishing NDO from DO (AUC range: 0.65–0.78), but further technical and clinical validation of these candidate biomarkers is needed.
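
As an illustration of the AUC statistic reported above, the following minimal sketch computes it as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. This pairwise formulation is a standard equivalent of the area under the ROC curve; it is not taken from the study's analysis code. The function name `roc_auc` and the marker levels are hypothetical and were invented purely for illustration.

```python
def roc_auc(positives, negatives):
    """Return the AUC as the fraction of (positive, negative) pairs in which
    the positive sample has the higher score; ties contribute 0.5."""
    if not positives or not negatives:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical log2-normalized marker levels (invented, not study data).
do_levels = [5.1, 6.3, 4.8, 7.0, 5.9]  # stand-in for DILI onset samples
hv_levels = [3.2, 4.9, 3.8, 4.1, 3.5]  # stand-in for healthy volunteers

auc = roc_auc(do_levels, hv_levels)
print(f"AUC = {auc:.2f}")
```

An AUC of 0.5 corresponds to no discrimination between the two groups, and 1.0 to complete separation.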


Supplementary Tables
• Supplementary Table 1

Supplementary Figures
Supplementary Figure 1. Comparison of leukocyte cell-derived chemotaxin 2 (LECT2) levels in the discovery cohort. The box and whisker plots represent the levels of LECT2 in the discovery cohort comprising healthy volunteers (HV, n = 10), acute DILI onset (DO, n = 10), DILI follow-up (DF, n = 10), acute non-DILI onset (NDO, n = 5), non-DILI follow-up (NDF, n = 5), and chronic nonalcoholic fatty liver disease (NAFLD, n = 10). The centre line in the box corresponds to the median, the box represents the first and third quartiles, and the whiskers represent the minimum and maximum observed values. Source data are provided as a Source Data file.

Figure 4. Development of multivariate models to distinguish DO from HV. a AUC from logistic regression and random forest (RF) predictive, multivariate models including all candidate biomarkers, using the confirmatory cohort, DO (n = 76) and HV (n = 60). Box plots indicate median (middle line), 25th and 75th percentiles (box) and 95th percentile (whiskers) as well as outliers (single points). b Variable importance scores for candidate biomarkers based on 500 bootstrap iterations of the RF model from (a). The y-axis represents the importance scores scaled to a maximum score of 100. Box plots indicate median (middle line), distribution of score (box) and 1.5x interquartile range (whiskers) as well as outliers (single points), which may be truncated by axes limits at 0 or 100. c AUC of four RF models (panels); each model is described in (d). Box plots indicate median (middle line), 25th and 75th percentiles (box) and 95th percentile (whiskers) as well as outliers (single points). d AUCs for the four models in (c) tested using the replication cohort as an independent validation dataset. Source data are provided as a Source Data file.

Figure 5. Development of models to distinguish DO from NDO. a AUC from logistic regression and random forest (RF) predictive, multivariate models including all candidate biomarkers, using the confirmatory cohort, NDO (n = 32) and DO (n = 76). b and c Variable importance scores for candidate biomarkers based on 500 bootstrap iterations of the logistic regression (b) and RF (c) models from NDO and DO in (a). The y-axis represents the importance scores scaled to a maximum score of 100. d AUC for logistic regression and RF models (shown in Table 2) developed based on the best performing biomarkers (Supplemental Tables 4 and 5). Model 1: FBP1+GSTA1; Model 2: FBP1+GSTA1+LECT2; Model 3: FBP1+CES1+LECT2; Model 4: FBP1+LECT2; Model 5: FBP1+LECT2+CPS1. Box plots indicate median (middle line), distribution of score (box) and 1.5x interquartile range (whiskers) as well as outliers (single points), which may be truncated by axes limits at 0 or 100. In (a) and (d), box plots indicate median (middle line), 25th and 75th percentiles (box) and 95th percentile (whiskers) as well as outliers (single points). Source data are provided as a Source Data file.

Figure 6. Gene signature analysis using the liver cell population for differentially expressed liver-enriched proteins. The pathway enrichment scores for pairwise comparisons between NDO and DO, NDO and HV, and DO and HV are shown to identify up- or downregulated pathways in liver zones. The x-axis represents normalized enrichment scores, calculated by the fgsea package, for the pathways shown on the y-axis.

Figure 7. Correlation between ALT activity and candidate biomarkers (HPD, OTC, GSTA1, DMGDH, CES1, LECT2, and PCK2). The individual log2 normalized levels, correlation coefficients and significance levels (two-sided, no adjustment) are shown for the confirmatory cohort HV (n = 60) and patients with onset of DILI (DO, n = 82). Source data are provided as a Source Data file.

Supplementary Tables
Supplementary Table 1. ALT, AST, ALP, and TBL markers were used for defining acute DILI or non-DILI as described in the Methods section of the manuscript. GLDH and CK18 have previously been investigated and identified as promising biomarkers, so they were included in our study. CI, confidence interval.

How missing data on the index test and reference standard were handled (Methods)
17. Any analyses of variability in diagnostic accuracy, distinguishing pre-specified from exploratory (Supplementary)
18. Intended sample size and how it was determined (na)

RESULTS

Participants
19. Flow of participants, using a diagram (Fig 1)
20. Baseline demographic and clinical characteristics of participants (Table 1)
21a. Distribution of severity of disease in those with the target condition (na)
21b. Distribution of alternative diagnoses in those without the target condition (Table 1)
22. Time interval and any clinical interventions between index test and reference standard
Where the full study protocol can be accessed (Methods)
30. Sources of funding and other support; role of funders (Acknowledgements)